
Conversation

@ngquangtrung57
Collaborator

Motivation

Support Qwen 2.5 Omni

Modifications

  • Add Qwen 2.5 Omni
  • Add video backend: qwen_omni_utils

Checklist

  • Format your code
  • Add unit tests
  • Update documentation as needed, including docstrings or example tutorials.

@ngquangtrung57 ngquangtrung57 requested a review from kcz358 October 2, 2025 18:46
Collaborator


Can probably refactor these changes into a new Qwen Omni dataset that inherits from this one and overrides what it needs, keeping vision_audio_dataset unchanged.

Collaborator


Can override this logic in qwen2_5_omni_processor and keep this file unchanged.

Comment on lines 72 to 91
# for Qwen2.5-Omni, use ThinkerForConditionalGeneration
if config.model_type == "qwen2_5_omni":
    from transformers.models.qwen2_5_omni.modeling_qwen2_5_omni import (
        Qwen2_5OmniThinkerForConditionalGeneration,
    )

    class Qwen2_5OmniThinkerForConditionalGenerationWithDtype(
        Qwen2_5OmniThinkerForConditionalGeneration
    ):
        @classmethod
        def from_pretrained(cls, *args, **kwargs):
            if "torch_dtype" not in kwargs:
                kwargs["torch_dtype"] = "auto"
            model = super().from_pretrained(*args, **kwargs)
            import torch

            model = model.to(dtype=torch.bfloat16)
            return model

    return Qwen2_5OmniThinkerForConditionalGenerationWithDtype
Collaborator


Qwen2_5OmniThinkerForConditionalGeneration is itself a PreTrainedModel and can use from_pretrained directly, so this part is redundant. Register it as an auto causal LM and remove this.
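For reference, the registration being suggested could look roughly like the sketch below (an assumption-laden illustration, not the PR's actual code: it assumes the thinker config class is Qwen2_5OmniThinkerConfig and that hooking into the Auto* registries is the intended place):

from transformers import AutoConfig, AutoModelForCausalLM
from transformers.models.qwen2_5_omni.configuration_qwen2_5_omni import (
    Qwen2_5OmniThinkerConfig,
)
from transformers.models.qwen2_5_omni.modeling_qwen2_5_omni import (
    Qwen2_5OmniThinkerForConditionalGeneration,
)

# Map the thinker's own model_type to its config, then map the config to the thinker
# model class, so AutoModelForCausalLM.from_pretrained can resolve a thinker checkpoint.
AutoConfig.register(
    Qwen2_5OmniThinkerConfig.model_type, Qwen2_5OmniThinkerConfig, exist_ok=True
)
AutoModelForCausalLM.register(
    Qwen2_5OmniThinkerConfig, Qwen2_5OmniThinkerForConditionalGeneration
)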

Comment on lines +97 to +106
if rms_norm:
    modeling_qwen2_5_omni.Qwen2RMSNorm = LigerRMSNorm
if cross_entropy:
    modeling_qwen2_5_omni.CrossEntropyLoss = LigerCrossEntropyLoss
if fused_linear_cross_entropy:
    modeling_qwen2_5_omni.Qwen2_5OmniThinkerForConditionalGeneration.forward = (
        qwen2_5_omni_lce_forward
    )
if swiglu:
    modeling_qwen2_5_omni.Qwen2MLP = LigerSwiGLUMLP
Collaborator


Qwen2.5 Omni doesn't have Qwen2RMSNorm or Qwen2MLP, so this won't patch the correct modules. Check https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py for the correct names.
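Not from the PR, but a quick way to confirm which classes the Omni modeling file actually exposes before monkey-patching (it only assumes the transformers module path above):

import inspect

import transformers.models.qwen2_5_omni.modeling_qwen2_5_omni as modeling_qwen2_5_omni

# List the norm/MLP classes defined in modeling_qwen2_5_omni so the Liger patch
# targets names that actually exist in the installed transformers version.
candidates = [
    name
    for name, obj in vars(modeling_qwen2_5_omni).items()
    if inspect.isclass(obj) and ("RMSNorm" in name or "MLP" in name)
]
print(candidates)  # patch these names rather than assuming Qwen2RMSNorm / Qwen2MLP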


Collaborator

@kcz358 kcz358 Oct 10, 2025


I think I scanned to this: https://github.com/huggingface/transformers/blob/0419ff881d7bb503f4fc0f0a7a5aac3d012c9b91/src/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py#L976-L987. Not sure if it can also be patched. But yeah, it should be fine to patch these two with Qwen2RMSNorm and Qwen2MLP. Should be fine for now, no need to change this part.

Collaborator


This doesn't actually remove padding from the ops right now. The input ids will be 2D, and I suspect you are directly using Qwen2.5 VL's forward? This could be buggy.

Comment on lines 75 to 78
elif config.model_type == "qwen2_5_omni":
    self.config = (
        config.text_config if hasattr(config, "text_config") else config
    )
Collaborator


If the config is already reset here, there's actually no need to rewrite _estimate_qwen2_flops later.

@ngquangtrung57
Collaborator Author

  • Most of the rmpad logic falls under the liger file rather than the ops file, because Qwen Omni doesn't have an intermediate model layer.
  • I had some issues with registering and loading the Thinker only, so I still modify the mapping_func file.
  • I have fixed the tmrope issue I mentioned that caused the error with sp. However, in my testing with sp=2 on 8 GPUs, the first step ran successfully and then the second step hit an NCCL timeout on the AlltoAll operation. This could be an issue with my server, because I get the same error on other projects when the GPUs perform AllReduce or AllGather.

@ngquangtrung57 ngquangtrung57 requested a review from kcz358 October 12, 2025 17:22
Comment on lines 72 to 94

# For Qwen2.5-Omni, load only the thinker model (not the full model with the talker);
# this needs special handling, otherwise the full model would be loaded.
if hasattr(config, "model_type") and config.model_type == "qwen2_5_omni":

    class Qwen2_5OmniThinkerLoader:
        @staticmethod
        def from_pretrained(pretrained_model_name_or_path, *args, **kwargs):
            # Load the full model first
            full_model = AutoModelForCausalLM.from_pretrained(
                pretrained_model_name_or_path, *args, **kwargs
            )
            # Extract and return only the thinker
            thinker_model = full_model.thinker
            # Clean up the full model to save memory
            if hasattr(full_model, "talker"):
                del full_model.talker
            if hasattr(full_model, "token2wav"):
                del full_model.token2wav
            del full_model
            return thinker_model

    return Qwen2_5OmniThinkerLoader

Collaborator


This part should be removed

Collaborator

@kcz358 kcz358 left a comment


Hi, I think most of the parts look fine right now. A few suggestions could improve this PR further. I see that we are only training the Thinker part, so I think you can wrap the actual training model into a PreTrainedModel and save it as a pretrained checkpoint. Then you can simply use from_pretrained with that checkpoint and rename the model to qwen2_5_omni_thinker (you could also push the converted checkpoint to the hub, maybe under lmms-lab).

@kcz358
Collaborator

kcz358 commented Oct 15, 2025

To clarify more, my feeling is that you can actually drop the whole talker module. You can do it as follows (see the sketch after this comment):

You can see here that the thinker module itself is a PreTrainedModel and has its own model type:
https://github.com/huggingface/transformers/blob/94df0e65602922be2831b3faa457a2bde78b936b/src/transformers/models/qwen2_5_omni/configuration_qwen2_5_omni.py#L413-L484

So you can

  1. First load the pretrained weights from the original Qwen2.5-Omni checkpoint
  2. Extract the thinker module (can be something like model.thinker, model.thinker.thinker_config)
  3. Save the pretrained thinker checkpoint, config, and preprocessor using save_pretrained
  4. In the init part, register the thinker module as an AutoModelForCausalLM (it is not registered by transformers, right?)
  5. Load from your pretrained checkpoint, processor, and config so that we have a very clean mapping func and registration
  6. [Optional] Push the checkpoint to the hub so that everyone can use it

You might need to change your patching logic accordingly, though, because the model structure changes a bit.
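A minimal sketch of steps 1-3 above (the source checkpoint id and output directory are only illustrative, and dtype handling is simplified):

import torch
from transformers import AutoProcessor, Qwen2_5OmniForConditionalGeneration

src = "Qwen/Qwen2.5-Omni-7B"        # original full checkpoint (thinker + talker + token2wav)
out_dir = "./qwen2_5_omni_thinker"  # example output path for the thinker-only checkpoint

# 1. Load the original full model
full_model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    src, torch_dtype=torch.bfloat16
)

# 2. Extract the thinker module (a standalone PreTrainedModel with its own config)
thinker = full_model.thinker

# 3. Save the thinker weights/config plus the preprocessor alongside them
thinker.save_pretrained(out_dir)
AutoProcessor.from_pretrained(src).save_pretrained(out_dir)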

@kcz358
Collaborator

kcz358 commented Oct 15, 2025

  • Most of the rmpad logic falls under the liger file rather than the ops file, because Qwen Omni doesn't have an intermediate model layer.
  • I had some issues with registering and loading the Thinker only, so I still modify the mapping_func file.
  • I have fixed the tmrope issue I mentioned that caused the error with sp. However, in my testing with sp=2 on 8 GPUs, the first step ran successfully and then the second step hit an NCCL timeout on the AlltoAll operation. This could be an issue with my server, because I get the same error on other projects when the GPUs perform AllReduce or AllGather.

For the sp AlltoAll ops, this could possibly be caused by a tensor shape mismatch during the gather operations, so it gets stuck there. You may need to check the split size on each rank and validate whether the gather size is being set correctly.
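A rough debugging sketch for that check (the helper and its call site are hypothetical; it only assumes an initialized torch.distributed NCCL process group):

import torch
import torch.distributed as dist

def check_sp_splits(local_seq_len: int, sp_group=None) -> None:
    """Gather every rank's local sequence length before the AlltoAll and assert they match."""
    world_size = dist.get_world_size(group=sp_group)
    local = torch.tensor([local_seq_len], device=torch.cuda.current_device())
    gathered = [torch.zeros_like(local) for _ in range(world_size)]
    dist.all_gather(gathered, local, group=sp_group)
    lengths = [int(t.item()) for t in gathered]
    if dist.get_rank(group=sp_group) == 0:
        print(f"[sp debug] per-rank seq lens: {lengths}")
    # Equal-split all_to_all_single requires identical lengths on every rank;
    # a mismatch here would explain a hang in the collective.
    assert len(set(lengths)) == 1, f"mismatched sp split sizes: {lengths}"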

ngquangtrung57 and others added 3 commits October 17, 2025 15:04
Removed comment about using exist_ok=True in AutoConfig registration.
@ngquangtrung57
Collaborator Author

I have extracted the thinker only and pushed it here: "ngqtrung/Qwen2.5-Omni-Thinker-7B". When I tried to move it to the lmms-lab hub, it showed "The storage patterns of org lmms-lab tripped our internal systems! Please contact us at [email protected] so we can verify your account and unlock more storage for your use-case.", so for now I keep it in my hub.
I have checked the tensor shapes of the AlltoAll op and they look correct. Also, the hang happens after the lce forward (after computing the loss), not when performing "gather_heads_scatter_seq". I think this could be related to the trainer or, as I said, to my server itself.

@ngquangtrung57 ngquangtrung57 requested a review from kcz358 October 17, 2025 07:14
@kcz358
Collaborator

kcz358 commented Oct 20, 2025

Hi Trung, thank you for all the efforts. I think most of the parts look nice and LGTM. Can you resolve the conflict and change the example config to the new format? Since we are using Hydra now, we can either use a config yaml or simply override args via the CLI. Check examples/qwen3_vl for detailed information. Thanks!

@ngquangtrung57 ngquangtrung57 merged commit 25aecdf into main Oct 20, 2025
2 checks passed
@kcz358
Collaborator

kcz358 commented Oct 20, 2025

Please don't use merge, because it merges every commit into the main branch and causes a non-linear history. Instead, use squash and merge. I will revert this PR for now and re-PR this feature.

@kcz358
Collaborator

kcz358 commented Oct 20, 2025

Oh, actually on my testing envs this also breaks some of my CI/CD in some cases. Let me see if we can fix it together in a new PR.

@ngquangtrung57
Collaborator Author

Sure, I can help with the fix. What's the current issue?

@kcz358
Collaborator

kcz358 commented Oct 20, 2025

It's okay, I fixed the import error already.

@kcz358 kcz358 mentioned this pull request Oct 20, 2025